Theory and Techniques for Synthesizing a Family of Graph Algorithms



Although Breadth-First Search (BFS) has several advantages over Depth-First Search (DFS), its prohibitive
space requirements have meant that algorithm designers often pass it over in favor of DFS.
To address this shortcoming, we introduce a theory of Efficient BFS (EBFS) along with a simple
recursive program schema for carrying out the search. The theory is based on dominance relations,
a long-standing technique from the field of search algorithms. We show how the theory can be used
to systematically derive solutions to two graph problems, namely the Single Source Shortest Path
problem and the Minimum Spanning Tree problem. The solutions are found by making small systematic
changes to the derivation, revealing connections between the two problems that are often
obscured in textbook presentations.

1 Introduction 

Program synthesis is experiencing something of a resurgence [21, 20, 4, 14, 22] following negative
perceptions of its scalability in the early 90s. Many of the current approaches aim for near-automated
synthesis. In contrast, the approach we follow, which we call guided program synthesis, also incorporates a
high degree of automation but is more user-guided. The basic idea is to identify interesting classes of
algorithms and capture as much generic algorithm design knowledge as possible in one place. The user
instantiates that knowledge with problem-specific domain information. This step is often carried out with
machine assistance. The approach has been used to derive scores of efficient algorithms
for a wide range of practical problems, including scheduling [18], concurrent garbage collection [13], and
SAT solvers [19].

One significant class of algorithms that has been investigated is search algorithms. Many interesting
problems can be solved by search. In such an approach, an initial search space is partitioned
into subspaces, a process called splitting, which continues recursively until a feasible solution
is found. A feasible solution is one that satisfies the given problem specification. Viewed as a search
tree, spaces form nodes, and the subspaces after a split form the children of that node. The process
has been formalized by Smith [15, 17]. Problems that can be solved by global search are said to be
in the Global Search (GS) class. The enhancements in GS over standard branch-and-bound include a
number of techniques designed to improve the quality of the search by eliminating unpromising avenues.
One such technique is referred to as dominance relations. Although they do not appear to have been
widely used, the idea of dominance relations goes back to at least the 70s. Essentially, a dominance
relation is a relation between two nodes in the search tree such that if one dominates the other, then the
dominated node is guaranteed to lead to a solution no better than the dominating one, and can therefore be
discarded. Establishing a dominance relation for a given problem is carried out by a user. However, this
process is not always obvious. There are also a variety of ways in which to carry out the search, for
example Depth-First (DFS), Breadth-First (BFS), Best-First, etc. Although DFS is the most common,
BFS would have several advantages over DFS were it not for its exponential space requirement. The key
to carrying out BFS space-efficiently is to limit the size of the frontier at any level. However, this has not
previously been investigated in any systematic manner.
This paper has two main contributions: 

• We show how to limit the size of the frontier in search using dominance relations, thereby enabling
space-efficient BFS. From this formal characterization, we derive a characteristic recurrence that
serves as the basis of a program schema for implementing Global Search. Additionally, we show
that limiting the size of the undominated frontier to one results in a useful class of greedy algorithms.

• We show how to derive dominance relations, and demonstrate that they satisfy the greediness conditions,
for two graph problems, namely Single Source Shortest Path and Minimum Spanning Tree. The derivation
follows a systematic process which, though not automatic, we believe has the potential to be automated.

2 Background To Guided Program Synthesis 
2.1 Process 

The basic steps in guided program synthesis are: 

1. Start with a logical specification of the problem to be solved. A specification is a quadruple
$\langle D, R, o, c\rangle$ where $D$ is an input type, $R$ an output or result type, $o : D \times R \to \mathit{Bool}$ is a predicate relating
correct or feasible outputs to inputs, and $c : D \times R \to \mathit{Int}$ is a cost function on solutions. An example
specification is given in Example 1 (explained in more detail below).

2. Pick an algorithm class from a library of algorithm classes (Global Search, Local Search,
Divide and Conquer, Fixpoint Iteration, etc.). An algorithm class comprises a program
schema containing operators to be instantiated and an axiomatic theory of those operators (see [9]
for details). A schema is analogous to a template in C++ or a generic class in Java, with the difference
that both the template and the template arguments are formally constrained.

3. Instantiate the operators of the program schema using information about the problem domain and 
in accordance with the axioms of the class theory. To ensure correctness, this step can be carried 
out with mechanical assistance. The result is an efficient algorithm for solving the given problem. 

4. Apply low-level program transforms such as finite differencing, context-dependent simplification,
and partial evaluation, followed by code generation. Many of these are applied automatically by
Specware [1], a formal program development environment.

The result of Step 4 is an efficient program for solving the problem which is guaranteed correct by
construction. The power of the approach stems from the fact that the common structure of many algorithms
is contained in one reusable program schema and associated theory. Of course the program schema needs
to be carefully designed, but that is done once by the library designer. The focus of this paper is the
Global Search class, and specifically how to carry out Step 3 methodically for a wide variety of
problems. Details of the other algorithm classes and steps are available elsewhere [11, 18].
Example 1. The specification of the Single Pair Shortest Path (SPSP) problem is shown in Fig. 2.1 (the symbol
$\mapsto$ reads as "instantiates to"). The input $D$ is a structure with three fields, namely a start node, an end node and a
set of edges. The result $R$ is a sequence of edges ([...] notation). A correct result is one that satisfies the
predicate path?, which checks that $z$ is a contiguous path from the start node to the end node
(its simple recursive definition is not shown). Finally, the cost of a solution is the sum of the costs of the edges
in that solution. Note that fields of a structure are accessed using the '.' notation.
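A minimal executable rendering of the specification in Fig. 2.1 may help fix intuitions. It is a sketch under stated assumptions, not part of the synthesis framework. The Haskell names (Problem, Edge with fields from, to and w, isPath, feasible, cost) are hypothetical, and nodes are plain Ints. The recursive definition of path?, which the text omits, is also our own guess: the empty path goes from a node to itself, and each edge must start where the previous one ended.

```haskell
-- Illustrative sketch of the SPSP specification (Fig. 2.1).
-- All names are hypothetical; the definition of isPath is assumed.
type Node = Int

data Edge = Edge { from :: Node, to :: Node, w :: Int }
  deriving (Eq, Show)

-- D: start node s, end node e, and the set of edges
data Problem = Problem { s :: Node, e :: Node, edges :: [Edge] }

-- path? p u v: p is a contiguous sequence of edges leading from u to v
isPath :: [Edge] -> Node -> Node -> Bool
isPath [] u v = u == v
isPath (ed : rest) u v = from ed == u && isPath rest (to ed) v

-- o(x, z): z is a contiguous path from x.s to x.e
feasible :: Problem -> [Edge] -> Bool
feasible x z = isPath z (s x) (e x)

-- c(x, z): the sum of the edge weights in z
cost :: Problem -> [Edge] -> Int
cost _ z = sum (map w z)
```

For example, in a graph with edges 1→2 (weight 4), 2→3 (weight 1) and 1→3 (weight 7), the two-edge path from 1 to 3 is feasible and costs 5.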
2.2 Global Search

Before delving into a program schema for Global Search, it helps to understand the structures over
which the program schema operates. In [15], a search space is represented by a descriptor of some type
$\hat{R}$, which is an abstraction of the result type $R$. The initial or starting space is denoted $\bot$. There are
also two predicates, $split : D \times \hat{R} \times \hat{R}$, written $\pitchfork$, and $extract : \hat{R} \times R$, written $\chi$. Split defines when
a space is a subspace of another space, and extract captures when a solution is extractable from a space.
We say a solution $z$ is contained in a space $y$ (written $z \in y$) if it can be extracted after a finite number
of splits. A feasible space is one that contains feasible solutions. We often write $\pitchfork(x, y, y')$ as $y \pitchfork_x y'$
for readability, and even drop the subscript when there is no confusion. Global Search theory (GS-theory) [18]
axiomatically characterizes the relation between the predicates $\bot$, $\pitchfork$ and $\chi$, as well as ensuring
that the associated program schema computes a result that satisfies the specification. In the sequel, the
symbols $\hat{R}, \bot, \pitchfork, \chi, \oplus$ are all assumed to be drawn from GS-theory.
A theory for a given problem is created by instantiating these terms, as shown in the next example.

$D \mapsto \langle s : Node,\ e : Node,\ edges : \{Edge\}\rangle$
$Edge = \langle f : Node,\ t : Node,\ w : Nat\rangle$
$R \mapsto [Edge]$
$o \mapsto \lambda(x, z) \cdot path?(z, x.s, x.e)$
$path?(p, s, f) = \ldots$
$c \mapsto \lambda(x, z) \cdot \sum_{edge \in z} edge.w$
Figure 2.1: Specification of Shortest Path problem
Example 2. Instantiating GS-theory for the Single Pair Shortest Path problem. The type of solution
spaces $\hat{R}$ is the same as the result type $R$.¹ A space is split by adding an edge to the current path; that is,
the subspaces are the different paths that result from adding an edge to the parent path. Finally, a solution
can be trivially extracted from any space by setting the result $z$ to the space $p$. This is summarized in Fig. 2.2
([] denotes the empty list, and ++ denotes list concatenation).
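Fig. 2.2 can likewise be transcribed into a small Haskell sketch. The names are hypothetical, and the types repeat the previous sketch so that this block stands alone. The helper level is our own addition and enumerates the spaces reachable by exactly k splits. Because split may append any input edge, a graph with m edges yields m^k spaces at level k. This is the exponential frontier growth discussed in Section 3.

```haskell
-- Illustrative sketch of the GS instantiation in Fig. 2.2.
-- All names are hypothetical.
type Node = Int

data Edge = Edge { from :: Node, to :: Node, w :: Int }
  deriving (Eq, Show)

data Problem = Problem { s :: Node, e :: Node, edges :: [Edge] }

-- Spaces (R-hat) have the same type as results: partial paths.
type Space = [Edge]

-- The initial space: the empty path, standing for all possible paths.
initial :: Problem -> Space
initial _ = []

-- split: the subspaces of p are p ++ [ed] for every input edge ed.
subspaces :: Problem -> Space -> [Space]
subspaces x p = [p ++ [ed] | ed <- edges x]

-- extract: the solution extractable from a space is the space itself.
extract :: Space -> [Edge]
extract p = p

-- All spaces reachable from the initial space by exactly k splits.
level :: Problem -> Int -> [Space]
level x k = iterate (concatMap (subspaces x)) [initial x] !! k
```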



$\hat{R} \mapsto R$
$\bot \mapsto \lambda x \cdot [\,]$
$\pitchfork \mapsto \lambda(x, p, p') \cdot \exists e \in x.edges \cdot p' = p \mathbin{+\!\!+} [e]$
$\chi \mapsto \lambda(z, p) \cdot p = z$
Figure 2.2: GS instantiation for Single Pair Shortest Path

2.3 Dominance Relations

As mentioned in the introduction, a dominance relation provides a way of comparing two subspaces
in order to show that one will always contain at least as good a solution as the other. (Goodness in this
case is measured by some cost function on solutions.) The first space is said to dominate ($\rhd$) the second,
which can then be eliminated from the search. Letting $c^*$ denote the cost of an optimal solution in a space,
this can be formalized as follows (all free variables are assumed to be universally quantified):

$$y \rhd y' \Rightarrow c^*(x, y) \le c^*(x, y') \qquad (2.1)$$

Another way of expressing the consequent of (2.1) is

$$\forall z' \in y' \cdot o(x, z') \Rightarrow \exists z \in y \cdot o(x, z) \wedge c(x, z) \le c(x, z') \qquad (2.2)$$

To derive dominance relations, it is often useful to first derive a semi-congruence relation [15]. A
semi-congruence between two partial solutions $y$ and $y'$, written $y \rightsquigarrow y'$, ensures that any way of extending
$y'$ into a feasible solution can also be used to extend $y$ into a feasible solution. Like $\pitchfork$, $\rightsquigarrow$ is a ternary
relation over $D \times \hat{R} \times \hat{R}$, but as we have done with $\pitchfork$ and many other such relations in this work, we drop
the input argument when there is no confusion and write it as a binary relation for readability. Before
defining semi-congruence, we introduce two concepts. One is the idea of usability of a space. A space $y$
is usable, written $o^*(x, y)$, if $\exists z \cdot \chi(y, z) \wedge o(x, z)$, meaning a feasible solution can be extracted from the
space. The second is the notion of incorporating sufficient information into a space to make it usable.
This is defined by an operator $\oplus : \hat{R} \times t \to \hat{R}$ that takes a space and some additional information of type
$t$ and returns a more defined space. The type $t$ depends on $\hat{R}$. For example, if $\hat{R}$ is the type of lists, then $t$
might also be the same type. Now the formal definition of semi-congruence is:

$$y \rightsquigarrow y' \Rightarrow \big(o^*(x, y' \oplus e) \Rightarrow o^*(x, y \oplus e)\big)$$

That is, $y \rightsquigarrow y'$ is a sufficient condition for ensuring that if $y'$ can be extended into a feasible solution,
then so can $y$ with the same extension. If $c$ is compositional (that is, $c(s \oplus t) = c(s) + c(t)$), then it can be
shown [9] that if $y \rightsquigarrow y'$ and $y$ costs no more than $y'$, then $y$ dominates $y'$ (written $y \rhd y'$). Formally:

$$y \rightsquigarrow y' \wedge c(x, y) \le c(x, y') \Rightarrow y \rhd y' \qquad (2.3)$$

The axioms given above extend GS-theory [18].

¹ There is a covariant relationship between an element of $\hat{R}$ and of $R$. For example, the initial space, corresponding to all
possible paths, is the empty list.

Example 3. Single Pair Shortest Path. Consider two paths $p$ and $p'$ leading from the start node. If $p$
and $p'$ both terminate in the same node, then $p \rightsquigarrow p'$. The reason is that any path extension $e$ (of type
$t = [Edge]$) of $p'$ that leads to the target node is also a valid path extension for $p$. Additionally, if $p$ is
no longer than $p'$, then $p$ dominates $p'$, which can be discarded. Note that this does not imply that $p$ leads
to the target node, simply that no optimal solutions are lost in discarding $p'$. This dominance relation is
formally derived in Example 8.
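The dominance test of Example 3 can be sketched as follows. The sketch assumes that both paths are valid paths from the start node, a condition that Example 8 later makes explicit, and it takes the empty path to end at the start node. The function names are hypothetical.

```haskell
-- Illustrative sketch of the Example 3 relations; names are hypothetical.
-- Both arguments are assumed to be valid paths from the start node.
type Node = Int

data Edge = Edge { from :: Node, to :: Node, w :: Int }
  deriving (Eq, Show)

pathCost :: [Edge] -> Int
pathCost = sum . map w

-- The node a path ends in; the empty path ends at the start node.
endNode :: Node -> [Edge] -> Node
endNode start [] = start
endNode _ p = to (last p)

-- Semi-congruence p ~> p': both paths end in the same node.
semiCongruent :: Node -> [Edge] -> [Edge] -> Bool
semiCongruent start p p' = endNode start p == endNode start p'

-- Dominance via (2.3): semi-congruent and no more costly.
dominates :: Node -> [Edge] -> [Edge] -> Bool
dominates start p p' = semiCongruent start p p' && pathCost p <= pathCost p'
```

For instance, the path 1→2→3 of cost 5 dominates the direct edge 1→3 of cost 7, but not the other way round.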

Example 4. 0-1 Knapsack

The 0-1 Knapsack problem is, given a set of items each of which has a weight and a utility, and a
knapsack that has some maximum weight capacity, to pack the knapsack with a subset of items that
maximizes utility and does not exceed the knapsack capacity. Given partial selections $k$ and $k'$, if $k$ and $k'$ have
both examined the same set of items and $k$ weighs no more than $k'$, then any additional items $e$ that can be
feasibly added to $k'$ can also be added to $k$, and therefore $k \rightsquigarrow k'$. Additionally, if $k$ has at least as much
utility as $k'$, then $k \rhd k'$.
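A sketch of the knapsack relations follows. It assumes items are examined in a fixed order, so that "examined the same set of items" reduces to comparing how many have been examined. The names Item, Partial, examined and packed are hypothetical and do not come from the text.

```haskell
-- Illustrative sketch of Example 4; all names are hypothetical.
-- Items are assumed to be examined in a fixed order.
data Item = Item { weight :: Int, utility :: Int }
  deriving (Eq, Show)

-- A partial selection: how many items have been examined so far,
-- and which of those were packed.
data Partial = Partial { examined :: Int, packed :: [Item] }
  deriving (Show)

totalWeight, totalUtility :: Partial -> Int
totalWeight = sum . map weight . packed
totalUtility = sum . map utility . packed

-- k ~> k': same items examined and k weighs no more than k', so any
-- items that can still be added to k' also fit into k.
semiCongruent :: Partial -> Partial -> Bool
semiCongruent k k' =
  examined k == examined k' && totalWeight k <= totalWeight k'

-- k |> k': additionally, k has at least as much utility as k'.
dominates :: Partial -> Partial -> Bool
dominates k k' = semiCongruent k k' && totalUtility k >= totalUtility k'
```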

The remaining sections cover the original contributions of this paper. 

3 A Theory Of Efficient Breadth-First Search (EBFS) 

While search can in principle solve for any computable function, it still leaves open the question of how
to carry it out effectively. Various search strategies have been investigated over the years, two of the
most common being Breadth-First Search (BFS) and Depth-First Search (DFS). It is well known that
BFS offers several advantages over DFS. First, unlike DFS, which can get trapped in infinite paths,² BFS will
always find a solution if one exists. Second, BFS does not require backtracking. Third, for deeper trees,
BFS will generally find a solution at the earliest possible opportunity, that is, at the shallowest level at which
one exists. However, the major drawback of BFS is its space requirement, which grows exponentially.
For this reason, DFS is usually preferred over BFS.

² This is resolvable in DFS with additional programming effort.
Our first contribution in this paper is to refine GS-theory to identify the conditions under which a
BFS algorithm can operate space-efficiently. The key is to show how the size of the undominated frontier
of the search tree can be polynomially bounded. Dominance relations are the basis for this.

In [18], the relation $\pitchfork^i$ for $i \ge 0$ is recursively defined as follows:

$$y \pitchfork^0 y' = (y = y')$$
$$y \pitchfork^{i+1} y' = \exists y'' \cdot y \pitchfork y'' \wedge y'' \pitchfork^i y'$$

From this, the next step is to define those spaces at a given frontier level that are not dominated. However,
this requires some care because dominance is a preorder; that is, it satisfies the reflexivity and transitivity
axioms as a partial order does, but not the antisymmetry axiom. It is quite possible for $y$ to
dominate $y'$ and $y'$ to dominate $y$ without $y$ and $y'$ being equal. An example in Shortest Path is two paths
of the same length from the start node that end at the same node: each path dominates the other. To
eliminate such cyclic dominances, define the relation $y \approx y'$ as $y \rhd y' \wedge y' \rhd y$. It is not difficult to show
that $\approx$ is an equivalence relation. Now let the quotient frontier at level $i$ be the quotient set $frontier_i/\!\approx$.
For type consistency, let the representative frontier $rfrontier_i$ be the quotient frontier in which each
equivalence class is replaced by some arbitrary member of that class. The representative frontier is the
frontier in which cyclic dominances have been removed. Finally, the undominated frontier $undom_i$
is $rfrontier_i - \{y \mid \exists y' \in rfrontier_i \cdot y' \ne y \wedge y' \rhd y\}$.
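One way to compute the undominated frontier directly from a list of spaces and a dominance preorder is sketched below. Rather than forming the quotient set explicitly, the sketch scans the frontier once. It discards any space dominated by an already-kept one, which also drops later members of the same ≈-class, and it evicts kept spaces that the new space dominates. Assuming dom is reflexive and transitive, the result contains exactly one representative of each undominated class. The function name undominated is hypothetical.

```haskell
import Data.List (foldl')

-- Illustrative sketch; 'undominated' is a hypothetical name.
-- dom is assumed to be a preorder (reflexive and transitive).
undominated :: (a -> a -> Bool) -> [a] -> [a]
undominated dom = reverse . foldl' step []
  where
    step kept y
      -- y is dominated by (or equivalent to) a space already kept
      | any (`dom` y) kept = kept
      -- otherwise keep y and evict every kept space that y dominates
      | otherwise          = y : filter (not . (y `dom`)) kept
```

As a usage example, take spaces abstracted as (end node, path cost) pairs, with dominance holding when the end nodes agree and the cost is no larger. Then the frontier [(3,7), (3,5), (2,4), (3,5), (2,6)] reduces to [(3,5), (2,4)].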

Now, given a problem in the GS class, if it can be shown that $|undom_i|$ for any $i$ is polynomially
bounded in the size of the input, a number of benefits accrue: (1) BFS can be used to tractably carry
out the search, as implemented in the raw program schema of Alg. 1; (2) the raw schema of Alg. 1
can be transformed into an efficient tail-recursive form, in which the entire frontier is passed down; and
(3) if additionally the tree depth can be polynomially bounded (which typically occurs, for example, in
constraint satisfaction problems or CSPs [3]), then, under some reasonable assumptions about the work
being done at each node, the result is a polynomial-time algorithm for the problem.



3.1 Program Theory 

A program theory for EBFS defines a recursive function which, given a space $y$, computes a non-trivial
subset of $FO_x(y)$, the optimal feasible solutions contained in $y$, where

$$FO_x(y) = opt_c\{z \mid z \in y \wedge o(x, z)\}$$

$opt_c$ selects the subset of its argument that is optimal with respect to the cost function $c$, defined as
follows:

$$opt_c S = \{z \mid z \in S \wedge (\forall z' \in S \cdot c(z) \le c(z'))\}$$

Also let $undom(y)$ be $undom_{l(y)+1} \cap \{yy \mid y \pitchfork yy\}$, where $l(y)$ is the level of $y$ in the tree. The following
proposition defines a recurrence for computing the feasible solutions in a space:

Proposition 5. Let $\langle D, R, \hat{R}, o, c, \bot, \pitchfork, \chi, \rhd, \oplus\rangle$ be a well-founded GS-theory w.r.t. the subspace relation
$\pitchfork$, let $F_x(y) = \{z \mid z \in y \wedge o(x, z)\}$ be the set of feasible solutions contained in $y$, and let
$G_x(y) = \{z \mid \chi(y, z) \wedge o(x, z)\} \cup \bigcup_{yy : y \pitchfork yy} G_x(yy)$ be a recurrence. Then $G_x(y) = F_x(y)$ for any $y$.

Proof. See [18]. □
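Prop. 5 is easy to animate. The following sketch computes $G_x(y)$ by the stated recurrence. Split, extract and the feasibility test $o(x, \cdot)$ are passed in as functions, an assumption made so that the block is independent of any particular problem. Lists stand in for sets, so nub removes duplicate solutions. As in the proposition, termination relies on split being well-founded. The name feasibleIn is hypothetical.

```haskell
import Data.List (nub)

-- Illustrative sketch of the recurrence G_x of Prop. 5;
-- 'feasibleIn' is a hypothetical name. Terminates only if the
-- subspace relation is well-founded.
feasibleIn :: Eq r
           => (s -> [s])   -- subspaces of a space (split)
           -> (s -> [r])   -- solutions directly extractable (extract)
           -> (r -> Bool)  -- feasibility o(x, z) for the fixed input x
           -> s -> [r]
feasibleIn subs ext ok y =
  nub ([z | z <- ext y, ok z] ++ concatMap (feasibleIn subs ext ok) (subs y))
```

For example, take spaces that decide, item by item, whether to include each of 1 to 4, with solutions extractable only once every item has been decided. With feasibility meaning "sums to 5", the recurrence returns the two subsets {1, 4} and {2, 3}.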
Finally, the following theorem defines a recurrence that can be used to compute $FO_x(y)$.

Theorem 6. Let $\pitchfork$ be a well-founded relation of GS-theory and let $GO_x(y) = opt_c(\{z \mid \chi(y, z) \wedge o(x, z)\} \cup
\bigcup_{yy \in undom(y)} GO_x(yy))$ be a recurrence. Then $GO_x(y) \subseteq FO_x(y)$.

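Proof. By generalized induction. The base case is those spaces which do not have subspaces. Then
$GO_x(y) = opt_c\{z \mid \chi(y, z) \wedge o(x, z)\}$. By Prop. 5, $\{z \mid \chi(y, z) \wedge o(x, z)\} = \{z \mid z \in y \wedge o(x, z)\}$, so
$GO_x(y) = FO_x(y)$. The inductive case is as follows:

$$
\begin{aligned}
& FO_x(y) \\
= \;& \{\text{defn}\} \\
& opt_c\{z \mid z \in y \wedge o(x, z)\} \\
= \;& \{\text{defn of } F_x\} \\
& opt(F_x(y)) \\
= \;& \{F_x(y) = G_x(y) \text{ by Prop. 5}\} \\
& opt(\{z \mid \chi(y, z) \wedge o(x, z)\} \cup \textstyle\bigcup_{y \pitchfork yy} G_x(yy)) \\
= \;& \{G_x(yy) = F_x(yy) \text{ by Prop. 5}\} \\
& opt(\{z \mid \chi(y, z) \wedge o(x, z)\} \cup \textstyle\bigcup_{y \pitchfork yy} F_x(yy)) \\
= \;& \{\text{distributivity of } opt\} \\
& opt(opt\{z \mid \chi(y, z) \wedge o(x, z)\} \cup opt(\textstyle\bigcup_{y \pitchfork yy} F_x(yy))) \\
= \;& \{\text{distributivity and idempotence of } opt\} \\
& opt(\{z \mid \chi(y, z) \wedge o(x, z)\} \cup \textstyle\bigcup_{y \pitchfork yy} opt(F_x(yy))) \\
= \;& \{\text{unfold defn of } F_x\text{, fold defn of } FO_x\} \\
& opt(\{z \mid \chi(y, z) \wedge o(x, z)\} \cup \textstyle\bigcup_{y \pitchfork yy} FO_x(yy)) \\
\supseteq \;& \{yy \in undom(y) \Rightarrow y \pitchfork yy\} \\
& opt(\{z \mid \chi(y, z) \wedge o(x, z)\} \cup \textstyle\bigcup_{yy \in undom(y)} FO_x(yy)) \\
\supseteq \;& \{\text{induction hypothesis: } FO_x(yy) \supseteq GO_x(yy)\} \\
& opt(\{z \mid \chi(y, z) \wedge o(x, z)\} \cup \textstyle\bigcup_{yy \in undom(y)} GO_x(yy)) \\
= \;& \{\text{fold defn of } GO_x\} \\
& GO_x(y)
\end{aligned}
$$

□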

The theorem states that if the feasible solutions immediately extractable from a space $y$ are combined
with the solutions obtained from $GO_x$ of each undominated subspace $yy$, and the optimal ones of those
retained, the result is a subset of $FO_x(y)$. The next theorem demonstrates the non-triviality³ of the recurrence
by showing that if a feasible solution exists in a space, then one will be found.
Theorem 7. Let $\pitchfork$ be a well-founded relation of GS-theory and $GO_x$ be defined as above. Then

$$FO_x(y) \ne \emptyset \Rightarrow GO_x(y) \ne \emptyset$$

Proof. The proof of Theorem 6 is a series of equalities except for two steps. It is sufficient to show that
both of these steps preserve non-triviality. The proof is again by induction over the subspace relation.
The first refinement reduces $\bigcup_{y \pitchfork yy} FO_x(yy)$ to $\bigcup_{yy \in undom(y)} FO_x(yy)$. Suppose $\exists yy \cdot y \pitchfork yy \wedge FO_x(yy) \ne \emptyset$.
If $yy \in undom(y)$ then we are done. Otherwise, if $yy$ is dominated, then there is some $yy' \rhd yy$ and, by the
property of dominance, $FO_x(yy') \ne \emptyset$, so $\bigcup_{yy \in undom(y)} FO_x(yy) \ne \emptyset$. The second refinement follows again
by induction, using the induction hypothesis $FO_x(yy) \ne \emptyset \Rightarrow GO_x(yy) \ne \emptyset$. □

From the characteristic recurrence we can straightforwardly derive a simple recursive function bfs
to compute a non-trivial subset of $FO_x(y)$ for a given $y$, shown in Alg. 1.

³ Non-triviality is similar but not identical to completeness. Completeness requires that every optimal solution is found by
the recurrence, which we do not guarantee.
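Algorithm 1 Program schema for EBFS in Haskell (the schema parameters are the fields of the record GS)

-- Schema parameters, to be instantiated for each problem:
-- d is the input type D, r the result type R, s the space type RHat.
data GS d r s = GS
  { initial   :: d -> s               -- the initial space
  , subspaces :: d -> s -> [s]        -- all yy such that split(x, y, yy)
  , extract   :: d -> s -> [r]        -- solutions extractable from a space
  , feasible  :: d -> r -> Bool       -- o(x, z)
  , cost      :: d -> r -> Int        -- c(x, z)
  , dominates :: d -> s -> s -> Bool  -- y dominates y'
  }

solve :: GS d r s -> d -> [r]
solve gs x = bfs gs x [initial gs x]

bfs :: GS d r s -> d -> [s] -> [r]
bfs _ _ [] = []                        -- empty frontier: nothing left to search
bfs gs x frontier = opt gs x (locals ++ subsolns)
  where
    localsOf y = [z | z <- extract gs x y, feasible gs x z]
    locals     = concatMap localsOf frontier
    allsubs    = concatMap (subspaces gs x) frontier
    -- one representative of each undominated class at the next level
    undom      = foldl keep [] allsubs
    keep kept yy
      | any (\k -> dominates gs x k yy) kept = kept
      | otherwise = yy : filter (not . dominates gs x yy) kept
    subsolns   = bfs gs x undom

-- opt: the minimum-cost elements of a list of solutions
opt :: GS d r s -> d -> [r] -> [r]
opt _ _ [] = []
opt gs x zs = [z | z <- zs, cost gs x z == best]
  where best = minimum (map (cost gs x) zs)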



The final program schema that is included in the Specware library is the result of incorporating a
number of other features of GS, such as necessary filters, bounds tests, and propagation, which are not
shown here. Details of these and other techniques are in [18].

3.2 A class of strictly greedy algorithms (SG) 

A greedy algorithm is one which repeatedly makes a locally optimal choice. For some classes of
problems this leads to a globally optimal solution. We can characterize optimally greedy
algorithms within EBFS by restricting the size of $undom_i$ for every $i$ to 1. If $undom_i \ne \emptyset$, then the singleton
member $y^*$ of $undom_i$ is called the greedy choice. In other work [12] we show how to derive greedy
algorithms for a variety of problems, including Activity Selection, One Machine Scheduling, Professor
Midas' Traveling Problem, and Binary Search.
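When the SG condition holds, the frontier passed between levels of Alg. 1 is a single space. The schema then degenerates into a loop that repeatedly moves to the greedy choice. The sketch below assumes that the dominance test is supplied as a function and that at every level some child dominates all of its siblings; the code reports an error otherwise. Extraction of the final solution is omitted, and the name greedy is hypothetical.

```haskell
-- Illustrative sketch of the strictly greedy (SG) specialization;
-- 'greedy' is a hypothetical name. Returns the leaf space reached.
greedy :: (s -> [s])        -- subspaces of a space (split)
       -> (s -> s -> Bool)  -- dominance, assumed reflexive
       -> s -> s
greedy subs dom y =
  case subs y of
    [] -> y                              -- no further splits: y is a leaf
    kids ->
      case [k | k <- kids, all (dom k) kids] of
        (k : _) -> greedy subs dom k     -- the greedy choice y*
        []      -> error "no child dominates all its siblings: not in SG"
```

As a toy usage, suppose the subspaces of n are n + 1, n + 2 and n + 3 while n is below 10, and dominance is (>=). The loop then visits 0, 3, 6 and 9 and stops at 12.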

4 Methodology 

We strongly believe that every formal approach should be accompanied by a methodology by which it 
can be used by a competent developer, without needing great insights. Guided program synthesis already 
goes a long way towards meeting this requirement by capturing design knowledge in a reusable form. The 
remainder of the work to be done by a developer consists of instantiating the various parameters of the 
program schema. In the second half of this paper, we demonstrate how carrying this out systematically 
allows us to derive several related graph algorithms, revealing connections that are not always obvious 
from textbook descriptions. We wish to reiterate that once the dominance relation and other operators 
in the schema have been instantiated, the result is a complete solution to the given problem. We focus 
on dominance relations because they are arguably the most challenging of the operators to design. The 
remaining parameters can usually be written down by visual inspection. 

The simplest form of derivation is to reason backwards from the conclusion $o^*(x, y \oplus e)$ of the semi-congruence
condition $y \rightsquigarrow y' \Rightarrow (o^*(x, y' \oplus e) \Rightarrow o^*(x, y \oplus e))$, while assuming $o^*(x, y' \oplus e)$. The additional assumptions that are made along the way
form the required semi-congruence condition. The following example illustrates the approach.
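$$
\begin{aligned}
& o^*(x, y \oplus e) \\
= \;& \{\text{defn of } o^*\} \\
& \exists z \cdot \chi(y \oplus e, z) \wedge o(x, z) \\
= \;& \{\text{defn of } \chi\} \\
& o(x, y \oplus e) \\
= \;& \{\text{defn of } o\} \\
& path?(y \oplus e, x.s, x.e) \\
= \;& \{\text{distributive law for } path?\} \\
& \exists n \cdot path?(y, x.s, n) \wedge path?(e, n, x.e) \\
\Leftarrow \;& \{o^*(x, y' \oplus e)\text{, i.e. } \exists m \cdot path?(y', x.s, m) \wedge path?(e, m, x.e)\text{. Let } m \text{ be the witness for } n\} \\
& path?(y, x.s, m) \wedge path?(e, m, x.e) \\
= \;& \{m = last(y').t \text{ (where } last \text{ returns the last element of a sequence)}\} \\
& last(y).t = last(y').t \wedge path?(y, x.s, m)
\end{aligned}
$$

Figure 4.1: Derivation of semi-congruence relation for Single Pair Shortest Path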

Example 8. The derivation of the semi-congruence relation for the Single Pair Shortest Path problem of Example 1 is a
straightforward calculation, as shown in Fig. 4.1. It relies on the specification of Shortest Path given in
Example 1 and the GS-theory in Example 2.

The calculation shows that a path $y$ is semi-congruent to $y'$ if $y$ and $y'$ both end at the same node and
additionally $y$ is itself a valid path from the start node to its last node. Since the cost function is compositional,
this immediately produces a dominance relation $y \rhd y' \Leftarrow last(y).t = last(y').t \wedge path?(y, x.s, last(y).t) \wedge
\sum_{edge \in y} edge.w \le \sum_{edge' \in y'} edge'.w$. Note the use of the distributive law for path? in step 4. Such laws are
usually formulated as part of a domain theory during a domain discovery process, or even as part of the
process of trying to carry out a derivation such as the one just shown. Given an appropriate constructive
prover (such as the one in KIDS [16]), such a derivation could in fact be automated. Other examples that
have been derived using this approach are Activity Selection [11], Integer Linear Programming [15], and
variations on the Maximum Segment Sum problem [10].

While this dominance relation could in principle be used to compute Single Source Shortest Path
using a Best-First search (such as A*), it would not be very efficient, as every pair of nodes on the frontier
would need to be compared. In the next section, a more powerful dominance relation is derived which can
be used in a Breadth-First search and, more importantly, can be shown to be in the SG class, resulting in
a very efficient greedy algorithm. The dominance relation just derived is still used, but in a subsidiary
role.

5 More complex derivations: A family of related algorithms 
5.1 (Single Source) Shortest Path 

Previously, in Example 8, we derived a dominance relation for the (undirected) single pair shortest path problem.
To solve the problem of finding all shortest paths from a given start node to every other node in
the graph, it is convenient to consider the output as a set of edges that form what is called a path tree, a
subgraph of the input graph which forms a spanning tree rooted at the start node. The desired output is a
path tree in which every path from the root is a shortest path. The specification of Single Pair Shortest Path
in Fig. 2.1 is revised as shown in Fig. 5.1. The revised instantiation of Global Search theory is shown
in Fig. 5.2. In what follows, the extends operator $\oplus$ is shown by simple concatenation.

$D \mapsto \langle s : Node,\ edges : \{Edge\}\rangle$
$Edge = \langle a : Node,\ b : Node,\ w : Nat\rangle$
$R \mapsto \{Edge\}$
$o \mapsto \lambda(x, z) \cdot connected(x, z) \wedge acyclic(x, z)$
$c \mapsto \lambda(x, z) \cdot \sum_{p \in pathsFrom(x.s)} c'(p)$
$c'(p) = \sum_{edge \in p} edge.w$
Figure 5.1: Specification of Shortest Path problem

$\hat{R} \mapsto R$
$\bot \mapsto \lambda x \cdot \{\}$
$\pitchfork \mapsto \lambda(x, p, pe) \cdot \exists e \in x.edges \cdot pe = p \cup \{e\}$
$\chi \mapsto \lambda(z, p) \cdot p = z$
$\oplus \mapsto \cup$
Figure 5.2: GS instantiation for Shortest Path

The goal is to show that there is at most one undominated child following a split of a partial solution $\alpha$. Let $\alpha e$ and
$\alpha e'$ be two children following a split of $\alpha$, that is, the graph $\alpha$ with edge $e$ added and the graph with $e'$ added.
Without loss of generality (w.l.o.g.) assume neither $e$ nor $e'$ is already contained in $\alpha$ and both connect
to $\alpha$. Let $z' = \alpha e' \omega'$ be a feasible solution derived from $\alpha e'$. The task is to construct a feasible solution
$z$ from $\alpha e$ and discover the conditions under which it is cheaper than $z'$. We will use the definition of
general dominance (2.2), repeated here for convenience:

$$\forall z' \in y' \cdot o(x, z') \Rightarrow \exists z \in y \cdot o(x, z) \wedge c(x, z) \le c(x, z')$$

Establishing $o(\alpha e \omega)$ requires $connected(\alpha e \omega)$ and $acyclic(\alpha e \omega)$.

In guided program synthesis, it is often useful to write down laws that will be needed during the
derivation. Some of the most useful laws are distributive laws and monotonicity laws. For example, the
following distributive law applies to acyclic:

$$acyclic(\alpha\beta) = acyclic(\alpha) \wedge acyclic(\beta) \wedge ac(\alpha, \beta)$$

where $ac$ defines what happens at the "boundary" of $\alpha$ and $\beta$:

$$ac(\alpha, \beta) = \forall m, n \cdot (\exists p \in \alpha^* \cdot path?(p, m, n)) \Rightarrow \neg\exists q \in \beta^* \cdot path?(q, m, n)$$

requiring that if there is a path in $\alpha$ connecting $m$ and $n$ ($\alpha^*$ is a regular expression denoting all possible sequences
of edges in $\alpha$), then there should be no path between $m$ and $n$ in $\beta$. Examples of monotonicity
laws are $acyclic(\alpha\beta) \Rightarrow acyclic(\alpha)$ and $connected(\alpha) \wedge connected(\beta) \wedge nodes(\alpha) \cap nodes(\beta) \ne \emptyset
\Rightarrow connected(\alpha\beta)$. By using the laws constructively, automated tools such as KIDS [16] can often
suggest instantiations for terms. For instance, after being told $connected(\alpha e' \omega')$, the tool could apply
the monotonicity law for connected to suggest $connected(\alpha e e' \omega')$, that is, to try $\omega = e'\omega'$. With this
binding for $\omega$ we can attempt to establish $acyclic(\alpha e \omega)$ by expanding its definition. One of the terms
is $\ldots ac(e, \alpha e' \omega') \ldots$, which fails because $connected(\alpha e' \omega')$ implies $path?(\alpha e' \omega', e.a, e.b)$, so adding edge $e$
creates a cycle. The witness to the failure is some path $\pi' = e_j \ldots e_k$ where $e_j.a = e.a \wedge e_k.b = e.b$.
One possibility is that $e_j = e_k = e$, that is, $\omega'$ contains $e$. If so, let $\omega' = e\psi'$ for some $\psi'$. Then
$z' = \alpha e' \omega' = \alpha e' e \psi' = \alpha e e' \psi'$. Let $\omega = e'\psi'$; now $z = z'$. Otherwise, w.l.o.g. assume that $e$ connects
with $\alpha$ at $e.a$, and therefore so does $e_j$, so the case $e_j.b = e.b$ is not very interesting. The only
option then is to remove edge $e_k$ from $\omega'$. Let $\omega' = e_k\psi'$, so that $\omega$ is $e'\psi'$. Now the requirement
becomes

$$acyclic(\alpha e) \wedge acyclic(e'\psi') \wedge ac(\alpha e, e'\psi')$$

$acyclic(e'\psi')$ follows from $acyclic(\alpha e'\psi')$ by monotonicity, and $ac(\alpha e, e'\psi')$ follows from $acyclic(\alpha e'\psi')$
and the fact that $e_k$ was chosen above to remove the cycle in $\alpha e e' \omega'$. This demonstrates $o(\alpha e \omega)$ provided
$acyclic(\alpha e)$, where $\omega$ is constructed as above.

Figure 5.3: Feasible solution $\alpha e' \omega'$

Finally, to establish general dominance it is necessary to show that $z$ costs no more than $z'$. Note that
the cost of a solution is the sum of individual path costs starting from $x.s$. Let $m$ denote $e.a$ and $n$ denote
$e.b$ (and analogously for $e'$). Now consider a path to a node $p$ in $z'$. If the path to $p$ does not contain edge
$e_k$, that is, does not pass through $n$, then the same path holds in $z$. Otherwise let $\beta' e_i \gamma e'' \delta$ be a path to $p$ in $z'$, where $e''$
is the edge $e_k$ above (see Fig. 5.3). $\beta'$ is a path from $x.s$ in $\alpha$ and $e_i$ is some edge that leads out of $\alpha$ on
the path to $p$. Then the corresponding path in $z$ is $\beta e \delta$ (see Fig. 5.4).

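$$
\begin{aligned}
& c(\alpha e \omega, \beta e \delta) \le c(\alpha e' \omega', \beta' e_i \gamma e'' \delta) \\
= \;& \{\text{expand defns}\} \\
& c(z, \beta e) + c(z, \delta) \le c(z', \beta' e_i) + c(z', \gamma e'') + c(z, \delta) \\
\Leftarrow \;& \{\text{+ve edge weights, triangle inequality}\} \\
& c(z, \beta e) \le c(z', \beta' e_i)
\end{aligned}
$$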

As there were no restrictions on $e'$ above, let $e_i$ be the witness for $e'$. That is, provided
the path $\beta e$ is shorter than $\beta' e'$, there cannot be a shorter path via $e'$ to $n$. As cost is the sum of the path
costs, it follows that $c(x, \alpha e \omega) \le c(x, \alpha e' \omega')$. The dominance condition is then $\alpha e \rhd \alpha e' \Leftarrow acyclic(\alpha e) \wedge
c(z, \beta e) \le c(z', \beta' e')$. Finally, as it is not known at the time of the split which $e'$ will lie on the path to $e.b$,
to be conservative, let $e$ be that edge whose endpoint $e.b$ is the closest to the start node. This is therefore
the greedy choice. Incorporating finite differencing to incrementally maintain the distances to the nodes,
using the dominance relation derived earlier for Single Pair Shortest Path to eliminate the longer paths to
a node, and applying data structure refinement results in an algorithm similar to Dijkstra's algorithm for
single-source shortest paths.
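The greedy algorithm just derived can be sketched in Haskell as follows. This is not the synthesized code from the paper, and several details are assumptions. Each undirected edge is treated as two directed arcs, weights are assumed nonnegative, and the finite-differenced distances are kept in a plain association list. At each step the sketch adds the frontier edge whose endpoint is closest to the start node. Refusing to add a second edge into a node already in the tree plays the role of the Single Pair Shortest Path dominance relation. Without the data structure refinement to a priority queue, each step scans all edges. The name pathTree is hypothetical.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Illustrative sketch; names are hypothetical, weights nonnegative.
type Node = Int

data Edge = Edge { a :: Node, b :: Node, w :: Int }
  deriving (Eq, Show)

-- Returns the path-tree edges and the distance of each reached node.
pathTree :: Node -> [Edge] -> ([Edge], [(Node, Int)])
pathTree s es = go [] [(s, 0)]
  where
    -- treat each undirected edge as two directed arcs
    arcs = es ++ [Edge (b ed) (a ed) (w ed) | ed <- es]
    go tree dist =
      -- candidate edges leave the tree and enter a node not yet reached
      case [ (ed, d + w ed)
           | ed <- arcs
           , Just d <- [lookup (a ed) dist]
           , lookup (b ed) dist == Nothing ] of
        [] -> (reverse tree, reverse dist)
        cands ->
          -- greedy choice: the edge whose endpoint is closest to s
          let (ed, d) = minimumBy (comparing snd) cands
          in go (ed : tree) ((b ed, d) : dist)
```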
Figure 5.4: Feasible solution $\alpha e \omega$
Figure 5.5: Specification of Min. Spanning Tree problem
5.2 Minimum Spanning Tree 

The specification of MST is very similar to that of Shortest Path, with the difference that there is no
longer a distinguished node $s$, the input graph must be connected, and the cost of a solution is simply
the sum of the weights of the edges in the tree. The instantiation of the simple GS operators is as for SP.
Again, any edge that is added must not create a cycle or it cannot lead to a feasible solution. We will
describe the algorithm construction process informally so as to expose the connection with the Shortest
Path algorithm more clearly. However, the derivations shown here can also be similarly formalized.

There are now two ways to satisfy the acyclicity requirement. One is by choosing an edge connecting 
a node in a to one outside of a. Another is to choose an edge that connects two nodes within a, being 
careful not to create cycles. The two options are examined next. 

Option 1: Let $z' = \alpha e' \omega'$ be a feasible solution derived from $\alpha e'$. If $\omega'$ includes $e$, then let $\omega$ in a
feasible solution $z = \alpha e \omega$ simply be $(\omega' - \{e\}) \cup \{e'\}$, and then $z = z'$. Otherwise, if $\omega'$ does not contain
$e$, there must be some other path connecting $\alpha$ with e.t. W.l.o.g. assume that path is via $e'$. If $\alpha e' \omega'$ is
feasible, then it is a tree, so $\omega'$ is also a tree. Therefore it is not difficult to show that $z = \alpha e \omega'$ is also a
spanning tree. Now, to show dominance, derive conditions under which $z$ is cheaper than $z'$:

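$$
\begin{aligned}
& c(x, \alpha e \omega') < c(x, \alpha e' \omega') \\
=\ & \{\text{defn of } c\} \\
& \textstyle\sum_{edge \in \alpha e \omega'} edge.w < \sum_{edge \in \alpha e' \omega'} edge.w \\
=\ & \{\text{the edges of } \alpha \text{ and } \omega' \text{ appear on both sides}\} \\
& e.w < e'.w
\end{aligned}
$$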
Finally, as it is not known at the time of the split which e' will lie on the path to e.t, to be conservative,
let $e$ be the edge of least weight connecting $\alpha$ with an external node. This is therefore the greedy
choice. The result is an algorithm that is similar to Prim's algorithm for MSTs. 
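A minimal Python sketch of the Option 1 strategy follows, assuming a connected undirected graph given as adjacency lists of `(neighbor, weight)` pairs, with each edge listed in both directions. A heap holds candidate edges leaving the partial tree $\alpha$; the greedy step pops the least-weight edge whose far endpoint is still outside $\alpha$, which is exactly the acyclicity test of Option 1. The function name `prim_mst` and the lazy heap deletion are illustrative choices, not part of the derivation.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Tuple

# Assumed representation: undirected graph, each edge listed under both endpoints.
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def prim_mst(graph: Graph, root: Hashable) -> List[Tuple[Hashable, Hashable, float]]:
    """Greedy MST, Option 1: grow one tree from `root` (graph assumed connected)."""
    in_tree = {root}                      # nodes spanned by the partial solution alpha
    tie = itertools.count()               # tie-breaker so nodes need not be orderable
    frontier = [(w, next(tie), root, v) for v, w in graph[root]]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(in_tree) < len(graph):
        w, _, u, v = heapq.heappop(frontier)
        if v in in_tree:
            continue                      # both ends already in alpha: would create a cycle
        in_tree.add(v)                    # greedy choice: cheapest edge leaving alpha
        tree.append((u, v, w))
        for x, wx in graph[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (wx, next(tie), v, x))
    return tree
```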

Option 2: The difference from Option 1 is in how $e$ is chosen in order to ensure acyclicity. For a
feasible solution, $\alpha$ must not contain any cycles. Therefore it consists of a collection of acyclic connected
components, i.e., trees. No new edge can connect two nodes within the same component without introducing a
cycle, so it must connect two component trees. Connecting two trees by a single edge results in
a new tree. As in Option 1, let $z' = \alpha e' \omega'$ be a feasible solution derived from $\alpha e'$. If $\omega'$ includes $e$, then
let $\omega$ in a feasible solution $z = \alpha e \omega$ simply be $(\omega' - \{e\}) \cup \{e'\}$, and then $z = z'$. Otherwise, if $\omega'$ does
not contain $e$, there must be some other edge used to connect the two trees that $e$ would have connected.
W.l.o.g. assume that edge is $e'$. If $\alpha e' \omega'$ is feasible, then it is a tree, so $\omega'$ is also a tree. Therefore it is
not difficult to show that $z = \alpha e \omega'$ is also a spanning tree. The derivation of a cost comparison relation
is identical to Option 1, and once again the greedy choice is the edge $e$ of least weight that connects two
trees. The result of this option is an algorithm that is similar to Kruskal's algorithm.
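The Option 2 strategy can be sketched in Python with a union-find structure standing in for the collection of component trees in $\alpha$. This is a minimal sketch assuming a node list and undirected weighted edges `(u, v, w)`; the name `kruskal_mst` and the union-find representation are illustrative assumptions. Edges are examined in order of increasing weight, and an edge is accepted only if its endpoints lie in different trees.

```python
from typing import Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # assumed form: (u, v, weight), undirected


def kruskal_mst(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> List[Edge]:
    """Greedy MST, Option 2: alpha is a forest; join two trees with the cheapest edge."""
    parent = {n: n for n in nodes}        # union-find: one component tree per root

    def find(n: Hashable) -> Hashable:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    forest: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # least weight first
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                      # same tree: edge would create a cycle
        parent[ru] = rv                   # connecting two trees yields one tree
        forest.append((u, v, w))
    return forest
```

Because Python's `sorted` is stable, edges of equal weight are considered in their input order, so the result is deterministic for a given edge list.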

In conclusion, we observe that far from being completely different algorithms, Dijkstra's algorithm,
Prim's algorithm, and Kruskal's algorithm differ in only a very small number of (albeit important) ways. In
contrast, many textbook descriptions introduce the algorithms without motivation, followed
by separate proofs of correctness. We have shown how a systematic procedure can derive the different
algorithms with relatively minor changes to the derivations.
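As a rough illustration of this point (an illustrative sketch, not a transcription of the derivations above), the two frontier-based algorithms can share one Python skeleton in which only the priority assigned to a candidate edge $(u, v)$ of weight $w$ differs: Dijkstra's algorithm ranks it by the path length to $u$ plus $w$, Prim's algorithm by $w$ alone. The names `greedy_tree`, `priority`, `dijkstra_tree` and `prim_tree` are hypothetical, and the graph is assumed to be undirected adjacency lists with non-negative weights. Kruskal's algorithm does not fit this skeleton directly, since it replaces the single growing tree with a forest.

```python
import heapq
import itertools
from typing import Callable, Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def greedy_tree(graph: Graph, root: Hashable,
                priority: Callable[[float, float], float]) -> Dict[Hashable, Hashable]:
    """Grow a tree from `root`; priority(dist_u, w) ranks a candidate edge (u, v, w)."""
    dist: Dict[Hashable, float] = {root: 0.0}  # path length from root along tree edges
    parent: Dict[Hashable, Hashable] = {}
    done = set()
    tie = itertools.count()
    frontier = [(0.0, next(tie), root, None, 0.0)]  # (key, tie, node, from, weight)
    while frontier:
        _, _, v, u, w = heapq.heappop(frontier)
        if v in done:
            continue                    # v already in the tree: adding it would form a cycle
        done.add(v)
        if u is not None:
            parent[v] = u
            dist[v] = dist[u] + w
        for x, wx in graph[v]:
            if x not in done:
                heapq.heappush(frontier, (priority(dist[v], wx), next(tie), x, v, wx))
    return parent


def dijkstra_tree(graph: Graph, root: Hashable) -> Dict[Hashable, Hashable]:
    return greedy_tree(graph, root, lambda d, w: d + w)  # rank by path length


def prim_tree(graph: Graph, root: Hashable) -> Dict[Hashable, Hashable]:
    return greedy_tree(graph, root, lambda d, w: w)      # rank by edge weight
```

On a triangle with edges s–a (3), s–b (2) and a–b (2), the two variants diverge: the Dijkstra-style ranking attaches a directly to s (path length 3 versus 4), while the Prim-style ranking attaches a to b (weight 2 versus 3).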

6 Related Work 

Gulwani et al. [21, 4] describe a powerful program synthesis approach called template-based synthesis.
A user supplies a template or outline of the intended program structure, and the tool fills in the details. A
number of interesting programs have been synthesized using this approach, including Bresenham's line
drawing algorithm and various bit-vector manipulation routines. A related method is inductive synthesis
[6], in which the tool synthesizes a program from examples. The latter has been used to infer spreadsheet
formulae from examples. All of these tools rely on powerful SMT solvers. The Sketching approach of
Solar-Lezama et al. [14] also relies on inductive synthesis. A sketch, similar in intent to a template, is
supplied by the user, and the tool fills in such aspects as loop bounds and array indexing. Sketching relies
on efficient SAT solvers. To quote Gulwani et al., the benefit of the template approach is that "the
programmer only need write the structure of the code and the tool fills out the details" [21]. Rather than the
programmer supplying an arbitrary template, though, we suggest the use of a program schema from the
appropriate algorithm class (refer to Step 2 of the process in Sec. 2.1). We believe that the advantage of
such an approach is that, based on a sound theory, much can already be inferred at the abstract level and 
this is captured in the theory associated with the algorithm class. Furthermore, knowledge of properties 
at the abstract level allows specialization of the program schema with information that would otherwise 
have to either be guessed at by the programmer devising a template or inferred automatically by the tool 
(e.g., a tail-recursive implementation or an efficient implementation of dominance testing with hashing). We
believe this will allow semi-automated synthesis to scale up to larger problems such as constraint solvers
(SAT, CSP, LP, MIP, etc.), planning and scheduling, and OS-level programs such as garbage collectors
[13].

Program verification is another field that shares a common goal with program synthesis, namely a
correct, efficient program. The difference lies in the approach: we prefer to construct the program in a way
that is guaranteed to be correct, as opposed to verifying its correctness after the fact. Certainly some
recent tools such as Dafny [8] provide very useful feedback in an IDE during program construction.
But even such tools require significant program annotations in the form of invariants to be able to
automatically verify non-trivial examples such as the Schorr-Waite algorithm [8]. Nevertheless, we do
not see verification and synthesis as necessarily opposed. For example, ensuring the correctness of
the instantiation of several of the operators in the program schema, which is usually done by inspection,
is a verification task, as is ensuring the correctness of the schema that goes in the class library. We also
expect that recent advances in verification via SMT solvers will help guided synthesis by increasing the
degree of automation.

Refinement is generally viewed as an alternative to synthesis. A specification is gradually refined
into an efficient executable program. Refinement methods such as Z and B have proved to be very
popular. In contrast to refinement, guided program synthesis already has the program structure in place,
and the main body of work consists of instantiating the schema parameters, followed by various program
transformations, many of which can be applied mechanically. Both refinement and synthesis rely
extensively on tool support, particularly in the form of provers. We expect that advances in either
synthesis or refinement will benefit the other field.

7 Summary and Future Work 

We have formulated a theory of efficient breadth-first search based on dominance relations. A very useful 
specialization of this class occurs when there is at most one undominated child node. This is the class of 
Strictly Greedy algorithms. We have also derived a recurrence from which a simple program schema can 
be easily constructed. We have shown how to systematically derive dominance relations for a family of
important graph algorithms, revealing connections between them that are obscured when each algorithm
is presented in isolation.

Nearly all the derivations shown in this paper have been carried out by hand. However, they are 
simple enough to be automated. We plan to build a prover that incorporates the ideas presented
here. We are encouraged by the success of a similar prover that was part of KIDS, a predecessor to
Specware. 